Experiments with Arabic Topic Detection
Authors
Abstract
The continuous growth of information on the Internet and the availability of a large mass of electronic documents in Arabic give Natural Language Processing (NLP) tasks an important role in improving access to and exploitation of that information. Among these tasks, we are interested in Arabic Topic Detection. Our objective is to build an indexing system capable of identifying the general topics discussed in unvowelized Arabic documents. The proposed topic detection system for Arabic texts builds a Topic Oriented Vocabulary (TOV) using Mutual Information and classifies documents according to Jaccard and adapted TF-IDF indicators. The experimental results are presented in terms of precision, recall and F1 measure, and evaluate the influence of factors such as vocabulary length and morphological analysis on Arabic Topic Detection.
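The pipeline named in the abstract (a per-topic vocabulary selected by mutual information, then classification by Jaccard similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `topic_vocabulary`, `jaccard` and `detect_topic`, the use of pointwise mutual information over document frequencies, and whitespace tokenization are all assumptions made for illustration. The adapted TF-IDF indicator and the morphological analysis step are left out.

```python
import math
from collections import Counter


def topic_vocabulary(docs_by_topic, size):
    """Keep, for each topic, the `size` words with the highest pointwise
    mutual information with that topic (hypothetical helper)."""
    total_docs = sum(len(docs) for docs in docs_by_topic.values())
    df_all = Counter()   # number of documents containing each word
    df_topic = {}        # same count, restricted to one topic
    for topic, docs in docs_by_topic.items():
        counts = Counter()
        for doc in docs:
            counts.update(set(doc.split()))  # assumption: whitespace tokens
        df_topic[topic] = counts
        df_all.update(counts)

    vocab = {}
    for topic, counts in df_topic.items():
        p_topic = len(docs_by_topic[topic]) / total_docs

        def pmi(word, counts=counts, p_topic=p_topic):
            p_joint = counts[word] / total_docs
            p_word = df_all[word] / total_docs
            return math.log(p_joint / (p_word * p_topic))

        # Sort by descending PMI, breaking ties alphabetically.
        ranked = sorted(counts, key=lambda w: (-pmi(w), w))
        vocab[topic] = set(ranked[:size])
    return vocab


def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_topic(text, vocab):
    """Return the topic whose vocabulary is most similar to the text."""
    words = set(text.split())
    return max(sorted(vocab), key=lambda t: jaccard(words, vocab[t]))
```

The `size` parameter plays the role of the vocabulary length whose influence the abstract says is evaluated. Scoring by pointwise mutual information tends to favor rare words, so a real system would likely need frequency thresholds or a different weighting.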
Similar resources
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "tweets" via one of the machine l...
TDT-2002 Topic Tracking at Maryland: First Experiments with the Lemur Toolkit
The University of Maryland submitted six topic tracking runs for the 2002 Topic Detection and Tracking evaluation. Two runs were produced using the Lemur language modeling toolkit; the remaining four were produced using a separate system coded in Perl. The Lemur runs outperformed the Perl runs on the required condition because term frequency information was better handled. Two of the Perl runs...
Mani’s Living Gospel: A New Approach to the Arabic and Classical New Persian Testimonia
To reconstruct the contents of Mani's most famous work, the Living Gospel (originally written in Syriac), we have to use the Arabic and Classical New Persian texts that contain accounts and even indirect quotations of this book. One of the most remarkable points in these accounts is that they clearly show that an important part of the Living Gospel contains the Manicha...
Evaluation of Topic Identification Methods on Arabic Corpora
Topic Identification is one of the important keys to the success of many applications. However, there are few works in this field concerning the Arabic language because of the lack of standard corpora. In this study, we provide directly comparable results of six text categorization methods on a new Arabic corpus, Alwatan-2004. Hence, Topic Unigram Language Model (TULM), Term Frequency/Inverse Docum...